Efficient Probabilistic Latent Semantic Indexing using Graphics Processing Unit

Authors

Eli Koffi Kouassi

Toshiyuki Amagasa

Hiroyuki Kitagawa

Abstract

In this paper, we propose a scheme to accelerate Probabilistic Latent Semantic Indexing (PLSI), an automated document indexing method based on a statistical latent semantic model, by exploiting the high parallelism of the Graphics Processing Unit (GPU). Our proposal comprises three techniques: the first accelerates the Expectation-Maximization (EM) computation by applying GPU matrix-vector multiplication; the second uses the same principles as the first but exploits the sparseness of the word-document co-occurrence data; and the third uses concurrent kernel execution, which is available on the NVIDIA Fermi architecture, to speed up the process further. We compare the performance of the proposed scheme with a non-parallelized implementation. The results show that our method can be more than 100 times faster than the CPU-based implementation in our environment. By exploiting the sparseness of the data, we can not only process more documents and words on the GPU but also keep more data in device memory, avoiding massive data transfers between the host and the device that would otherwise degrade execution performance.
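
To make the sparseness idea concrete, the following is a minimal CPU-only sketch of the PLSI EM updates that visits only the nonzero word-document co-occurrences. It is an illustration under stated assumptions, not the authors' GPU implementation: it uses the symmetric parameterization P(d, w) = Σ_z P(z) P(d|z) P(w|z), and the names `plsi_em`, `log_likelihood`, and the `counts` dictionary layout are hypothetical choices made for this example.

```python
import math
import random


def plsi_em(counts, n_docs, n_words, n_topics, n_iters=50, seed=0):
    """Fit PLSI with EM, looping only over nonzero (doc, word) counts.

    counts maps (doc_index, word_index) -> positive count n(d, w).
    Returns (p_z, p_d_z, p_w_z) for P(z), P(d|z), and P(w|z).
    This dictionary layout is an assumption made for illustration.
    """
    if not counts:
        raise ValueError("counts must contain at least one co-occurrence")
    rng = random.Random(seed)

    def random_dist(size):
        # Strictly positive random values, normalized to sum to 1.
        raw = [rng.random() + 1e-3 for _ in range(size)]
        total = sum(raw)
        return [x / total for x in raw]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iters):
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]

        # Only stored (nonzero) entries are visited; zero counts add nothing.
        for (d, w), n_dw in counts.items():
            # E-step: P(z|d,w) is proportional to P(z) P(d|z) P(w|z).
            joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                     for z in range(n_topics)]
            norm = sum(joint)
            if norm == 0.0:
                continue
            for z in range(n_topics):
                r = n_dw * joint[z] / norm  # expected count for topic z
                acc_z[z] += r
                acc_d[z][d] += r
                acc_w[z][w] += r

        # M-step: renormalize the accumulated expected counts.
        total = sum(acc_z)
        for z in range(n_topics):
            if acc_z[z] > 0.0:
                p_d_z[z] = [v / acc_z[z] for v in acc_d[z]]
                p_w_z[z] = [v / acc_z[z] for v in acc_w[z]]
            p_z[z] = acc_z[z] / total
    return p_z, p_d_z, p_w_z


def log_likelihood(counts, p_z, p_d_z, p_w_z):
    """Log-likelihood of the observed counts under the fitted model."""
    ll = 0.0
    for (d, w), n_dw in counts.items():
        p_dw = sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                   for z in range(len(p_z)))
        ll += n_dw * math.log(p_dw)
    return ll
```

Because the loop touches only stored entries, both the work per iteration and the memory for the counts scale with the number of nonzero co-occurrences rather than with documents × words. A GPU version would presumably parallelize the per-entry E-step and the accumulations, but how the paper maps these steps onto kernels is not stated in the abstract.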

Similar resources

High order pLSA for indexing tagged images

This work presents a method for the efficient indexing of tagged images. Tagged images are a common resource of social networks and occupy a large portion of the social media stream. Their basic characteristic is the co-existence of two heterogeneous information modalities i.e. visual and tag, which refer to the same abstract meaning. This multi-modal nature of tagged images makes their efficie...

Parallel Implementations of Probabilistic Latent Semantic

Probabilistic Latent Semantic Analysis (PLSA) has been successfully applied to many text mining tasks such as retrieval, clustering, and summarization. PLSA involves iterative computation over a large number of parameters and may take hours or even days to process a large dataset, so speeding up PLSA is highly desirable in the domain of text mining. Recently, the general purpose graphic proce...

Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices

In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing, etc. T...

Massively Parallel Latent Semantic Analyses using a Graphics Processing Unit

Latent Semantic Analysis (LSA) aims to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever-expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequ...

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis (pLSA) is a technique from the category of topic models. Its main goal is to model co-occurrence information under a probabilistic framework in order to discover the underlying semantic structure of the data. It was developed in 1999 by Th. Hofmann [7] and was initially used for text-based applications (such as indexing, retrieval, clustering); however i...

Publication date: 2011
